Introduction

Source: https://www.expertafrica.com/wildlife/spotted-hyena/africa

Today, we will look at the distribution and environmental suitability of Spotted Hyenas. The occurrence is downloaded from GBIF. The environmental variables are bioclimatic variables:

Bioclimatic variables
code name
BIO1 Annual Mean Temperature
BIO2 Mean Diurnal Range (Mean of monthly (max temp - min temp))
BIO3 Isothermality (BIO2/BIO7) (×100)
BIO4 Temperature Seasonality (standard deviation ×100)
BIO5 Max Temperature of Warmest Month
BIO6 Min Temperature of Coldest Month
BIO7 Temperature Annual Range (BIO5-BIO6)
BIO8 Mean Temperature of Wettest Quarter
BIO9 Mean Temperature of Driest Quarter
BIO10 Mean Temperature of Warmest Quarter
BIO11 Mean Temperature of Coldest Quarter
BIO12 Annual Precipitation
BIO13 Precipitation of Wettest Month
BIO14 Precipitation of Driest Month
BIO15 Precipitation Seasonality (Coefficient of Variation)
BIO16 Precipitation of Wettest Quarter
BIO17 Precipitation of Driest Quarter
BIO18 Precipitation of Warmest Quarter
BIO19 Precipitation of Coldest Quarter

The habitat of Spotted Hyenas is mainly sub-Saharan Africa.

Because we didn’t learn spatial analysis yet, we will focus on non-spatial analysis today:

Data manipulation

Tasks:

Hint: we could directly use URL as the path to read tables.

library(readr)
library(dplyr)
library(passport)

# Read occurrence
occurrence <- read.csv('https://agroimpacts.github.io/geospaar/hyenas_occurrence.csv', 
                       stringsAsFactors = F) %>% 
  dplyr::select(gbifID, species, countryCode, decimalLatitude, 
         decimalLongitude, month, year) %>% 
  mutate(country = as_country_name(countryCode))

# Read variables
variables <- read_csv('https://agroimpacts.github.io/geospaar/hyenas_variables.csv') %>% 
  dplyr::select(gbifID,
         wc2.1_2.5m_bio_1, wc2.1_2.5m_bio_4,
         wc2.1_2.5m_bio_5, wc2.1_2.5m_bio_6,
         wc2.1_2.5m_bio_12, wc2.1_2.5m_bio_13,
         wc2.1_2.5m_bio_14, wc2.1_2.5m_bio_15) 
names(variables) <- c('gbifID', 
                      paste0('bio', c(1, 4:6, 12:15)))

# Join two tables, remove NAs and duplicates
occ_var <- inner_join(occurrence, variables) %>% 
  na.omit() %>% distinct()
head(occ_var)
##       gbifID         species countryCode decimalLatitude decimalLongitude month
## 1 3044961479 Crocuta crocuta          TZ       -2.333333         34.83333    12
## 2 3044944191 Crocuta crocuta          ZA      -28.037928         32.16869    10
## 3 3044939388 Crocuta crocuta          KE       -0.345021         36.81288     2
## 4 3044813070 Crocuta crocuta          KE       -2.637597         37.24065     9
## 5 3044804214 Crocuta crocuta          ZA      -24.116664         31.63650     9
## 6 3044803667 Crocuta crocuta          ZA      -24.605932         31.73148    12
##   year      country     bio1      bio4   bio5   bio6 bio12 bio13 bio14    bio15
## 1 2007     Tanzania 20.98183  84.84062 28.240 12.684   827   132    19 54.41696
## 2 2007 South Africa 21.26133 259.94644 29.528 11.228   941   141    29 47.39932
## 3 2021        Kenya 14.25450  76.70544 22.020  6.976  1260   227    53 54.45908
## 4 2017        Kenya 21.76983 146.04869 31.040 13.460   634   138     3 83.16368
## 5 2007 South Africa 22.77200 318.79556 32.340  9.424   511    94     5 80.18640
## 6 2020 South Africa 21.68317 311.25281 30.928  8.560   601   106     8 76.95921

Exploratory analysis

Bias of observations

First, let’s have a look at the potential bias of observations. One hypothesis:

  • People have higher chance to observe Hyenas with more out-door activities.

Month

Tasks:

  • Make a histogram of occurrence count for each month by filling the ___.
library(ggplot2)
ggplot(occ_var) + 
  geom_histogram(aes(x = as.factor(month)), 
           stat = 'count', 
           color = 'transparent',
           fill = 'blue') +
  labs(x = 'Month', y = 'Count') +
  theme_bw()

As we could see from the figure that there are more observations during July to November. This might because the weather is nicer for people to come out to meet Hyenas. Or it might because Hyenas have more activities during these months. So if we want to fit a model or so, it might be good to resample the observations to make sure they distribute evenly across each month.

There might be some similar bias of observation spatially, e.g. distance to roads. We will look at in unit2.

Year

Tasks:

  • Make a histogram of occurrence count for each year from scratch.
occ_var %>% filter(year >= 2000) %>% 
  ggplot() + 
  geom_histogram(aes(x = as.factor(year)),
                 stat = 'count',
                 color = 'transparent',
                 fill = 'red') +
  labs(x = 'Year', y = 'Count') +
  theme_bw() +
  theme(axis.text.x = element_text(angle = 90, hjust = 1))

2020 is very interesting that might relate to the outbreak of COVID.

Spatial distribution

Distribution by countries

Tasks:

  • Make a barplot for each country, and fill the bar of each country by month.
  • Fill the ___ to make the code work.
occ_var %>%
  ggplot(aes(forcats::fct_infreq(country))) + 
  geom_bar(aes(fill = as.factor(month)), 
           color = 'transparent') +
  labs(x = 'Country', y = 'Count') +
  scale_fill_brewer(palette = "Paired", name = 'Month') +
  theme_bw() +
  theme(axis.text.x = element_text(angle = 90, hjust = 1))

Randomly select 5 years to plot

Since we don’t learn spatial analysis yet. So now we just plot coordinates in Cartesian coordinate system like a regular plot.

Task:

  • Subset the occ_var with year >= 2000.
  • Randomly select 6 years out with seed 10. Hint: use sample function and %in% operator.
  • Make a scatterplot with decimalLongitude as x and decimalLatitude as y. And let the scatterplot has different color for each year.
  • Customize the color of years.
  • Reset x and y axis text.
  • Set a theme for your figure.
set.seed(10)
occ_var %>% 
  filter(year >= 2000) %>% 
  filter(year %in% sample(unique(year), 6)) %>% 
  ggplot() +
  geom_point(aes(x = decimalLongitude, 
                 y = decimalLatitude, 
                 col = as.factor(year)),
             size = 2) +
  scale_color_brewer(palette = "Dark2", name = 'Year') +
  labs(x = 'Longitude', y = 'Latitude') +
  theme_bw()

Make animations by year

Now let’s learn how to make an animated figure.

library(magick)
library(gganimate)
p1 <- occ_var %>% 
  filter(year >= 2000) %>% 
  mutate(year = as.integer(year)) %>% 
  ggplot() +
  geom_point(aes(x = decimalLongitude, 
                 y = decimalLatitude),
             size = 2, color = 'red') +
  scale_color_brewer(palette = "Dark2", name = 'Year') +
  labs(x = 'Longitude', y = 'Latitude', title = 'Year: {frame_time}') +
  transition_time(year) +
  theme_bw()

animate(p1, renderer = magick_renderer(), 
        fps = 5, height = 3.5, width = 3.5, 
        units = "in", res = 150)

Environmental suitability

Tasks:

  • Just select out bio columns.
  • Calculate the mean of these variables.
  • Do whatever necessary to put the results into a table with one column Variable of variable names and another column Mean of mean values.
  • Use kableExtra to show the table.
library(kableExtra)
occ_var %>% 
  dplyr::select(contains('bio')) %>% 
  apply(2, mean) %>% data.frame(Mean = .) %>% 
  mutate(Variable = rownames(.)) %>% 
  dplyr::select(Variable, Mean) %>% 
  kbl(row.names = F, caption = 'Variable mean') %>% 
  kable_paper("striped", 
              html_font = 'helvetica',
              font_size = 12,
              full_width = F)
Variable mean
Variable Mean
bio1 21.52944
bio4 210.89461
bio5 30.66609
bio6 10.88713
bio12 769.73049
bio13 147.28969
bio14 11.33184
bio15 76.49962

Tasks:

  • Just select out bio columns.
  • Expand the table using pivot_longer to put the names of variables to column var, and values of variables to column value.
  • Then add a column group: if bio1, bio4, bio5, has value “Temperature”, if not, has value “Rainfall”. Hint: use ifelse.
  • Make boxplot of var and value, and make facets based on group.
  • Set necessary axis text and theme.
library(ggpubr)
library(tidyr)
occ_var %>% 
  dplyr::select(contains('bio')) %>% 
  pivot_longer(cols = contains('bio'),
               names_to = 'var', values_to = 'value') %>% 
  mutate(group = ifelse(var %in% paste0('bio', 1:6), 
                        'Temperature', 'Rainfall')) %>% 
  ggplot() +
  facet_wrap(~ group, scale = 'free') +
  labs(x = 'Environmental variable', y = 'Value') +
  geom_boxplot(aes(x = var, y = value)) +
  theme_cleveland()

Ecological niche

Tasks:

  • Make a scatterplot of bio1 (x) and bio12 (y).
  • Set x and y axis text based on Bioclimatic variables code table.
  • Add two horizontal lines of linetype 2 and color red with intercept corresponding to the min and max value of bio12. Hint: use geom_hline.
  • Add two vertical lines of linetype 2 and color with intercept corresponding to the min and max value of bio1. Hint: use geom_vline.
  • Set a theme for your figure.
occ_var %>% 
  ggplot(aes(x = bio1, y = bio12)) +
  geom_point(col = 'blue', size = 2) +
  labs(x = 'Annual mean temperature', 
       y = 'Annual mean rainfall') +
  geom_vline(xintercept = min(occ_var$bio1), 
             col = 'red', linetype = 2) +
  geom_vline(xintercept = max(occ_var$bio1), 
             col = 'red', linetype = 2) +
  geom_hline(yintercept = min(occ_var$bio12), 
             col = 'red', linetype = 2) +
  geom_hline(yintercept = max(occ_var$bio12), 
             col = 'red', linetype = 2) +
  theme_light()